Large Margin Winnow Methods for Text Categorization
Abstract
The SNoW (Sparse Network of Winnows) architecture has recently been successfully applied to a number of natural language processing (NLP) problems. In this paper, we propose large margin versions of the Winnow algorithms, which we argue can potentially enhance the performance of basic Winnows (and hence the SNoW architecture). We demonstrate that the resulting methods achieve performance comparable with support vector machines for text categorization applications. We also explain why both large margin Winnows and SVMs can be suitable for NLP tasks.
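The abstract does not state the update rule, so the following is only a rough illustration of how a mistake-driven Winnow learner can be pushed toward a margin. It uses the common "thick threshold" heuristic: besides updating on mistakes, it also updates on correctly classified examples whose score falls inside a band around the threshold θ. This is an assumption for illustration and not necessarily the formulation proposed in the paper. The function names, the choice θ = number of features, the promotion factor α = 2, and the relative `margin` parameter are all hypothetical. Each document is represented sparsely by the indices of its active (present) word features, as is typical in text categorization.

```python
from typing import List, Sequence, Tuple

# Each example: (indices of active binary features, label in {+1, -1}).
Example = Tuple[Sequence[int], int]


def train_margin_winnow(
    data: Sequence[Example],
    n_features: int,
    alpha: float = 2.0,
    margin: float = 0.5,
    epochs: int = 10,
) -> Tuple[List[float], float]:
    """Positive Winnow with a hypothetical 'thick threshold' margin rule.

    With margin=0.0 this reduces to plain mistake-driven Winnow.
    """
    theta = float(n_features)        # assumed threshold: number of features
    w = [1.0] * n_features           # Winnow starts from uniform positive weights
    upper = theta * (1.0 + margin)   # positives should score at least this
    lower = theta * (1.0 - margin)   # negatives should score below this
    for _ in range(epochs):
        for active, y in data:
            score = sum(w[i] for i in active)  # sparse dot product
            if y == 1 and score < upper:
                for i in active:
                    w[i] *= alpha    # promotion: multiplicative increase
            elif y == -1 and score >= lower:
                for i in active:
                    w[i] /= alpha    # demotion: multiplicative decrease
    return w, theta


def predict(w: Sequence[float], theta: float, active: Sequence[int]) -> int:
    """Linear threshold unit: +1 iff the weighted sum reaches theta."""
    return 1 if sum(w[i] for i in active) >= theta else -1


if __name__ == "__main__":
    # Toy target: the disjunction "feature 0 OR feature 1" over 6 features.
    toy = [([0, 2], 1), ([1, 3], 1), ([0, 4], 1), ([1, 5], 1), ([0, 1], 1),
           ([2, 3], -1), ([4, 5], -1), ([2, 5], -1), ([3, 4], -1)]
    w_plain, _ = train_margin_winnow(toy, 6, margin=0.0)
    w_margin, _ = train_margin_winnow(toy, 6, margin=0.5)
    print(w_plain)   # [4.0, 4.0, 2.0, 2.0, 2.0, 2.0]
    print(w_margin)  # [8.0, 8.0, 1.0, 1.0, 1.0, 1.0]
```

On this toy data, plain Winnow (`margin=0.0`) stops updating once the positive examples score exactly θ = 6, so they sit on the decision boundary. The margin variant keeps promoting the relevant features and demoting the irrelevant ones until positives score at least 9 and negatives below 3, which separates the classes with room to spare.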
Similar resources
Mistake-driven Learning with Thesaurus for Text Categorization
This paper extends the mistake-driven learner WINNOW to better utilize thesauri for text categorization. In our method not only words but also semantic categories given by the thesaurus are used as features in a classifier. New filtering and disambiguation methods are used as pre-processing to solve the problems caused by the use of the thesaurus. In order to verify our methods, we test a large bod...
On the Importance of Parameter Tuning in Text Categorization
Text Categorization algorithms have a large number of parameters that determine their behaviour, whose effect is not easily predicted objectively or intuitively and may very well depend on the corpus or on the document representation. Their values are usually taken over from previously published results, which may lead to less than optimal accuracy in experimenting on particular corpora. In thi...
Combining Winnow and Orthogonal Sparse Bigrams for Incremental Spam Filtering
Spam filtering is a text categorization task that has attracted significant attention due to the increasingly huge amounts of junk email on the Internet. While current best-practice systems use Naive Bayes filtering and other probabilistic methods, we propose using a statistical, but non-probabilistic classifier based on the Winnow algorithm. The feature space considered by most current methods...
Automatic Categorization of Email into Folders: Benchmark Experiments on Enron and SRI Corpora
Office workers everywhere are drowning in email—not only spam, but also large quantities of legitimate email to be read and organized for browsing. Although there have been extensive investigations of automatic document categorization, email gives rise to a number of unique challenges, and there has been relatively little study of classifying email into folders. This paper presents an extensive...
Large margin multinomial mixture model for text categorization
In this paper, we present a novel discriminative training method for multinomial mixture models (MMM) in text categorization based on the principle of large margin. Under some approximation and relaxation conditions, large margin estimation (LME) of MMMs can be formulated as linear programming (LP) problems, which can be efficiently and reliably solved by many general optimization tools even fo...
Publication date: 2000